High genetic diversity of East Asian village dogs has recently been 
used to argue for an East Asian origin of the domestic dog. However, 
global village dog genetic diversity and the extent to which semiferal 
village dogs represent distinct, indigenous populations instead of 
admixtures of various dog breeds have not been quantified. Understanding
these issues is critical to properly reconstructing the timing,
number, and locations of dog domestication. To address these ques- 
tions, we sampled 318 village dogs from 7 regions in Egypt, Uganda, 
and Namibia, measuring genetic diversity across 680 bp of the mitochondrial
D-loop, 300 SNPs, and 89 microsatellite markers. We also analyzed
breed dogs, including putatively African breeds (Afghan
hounds, Basenjis, Pharaoh hounds, Rhodesian ridgebacks, and Sa- 
lukis), Puerto Rican street dogs, and mixed breed dogs from the 
United States. Village dogs from most African regions appear genet- 
ically distinct from non-native breed and mixed-breed dogs, although 
some individuals cluster genetically with Puerto Rican dogs or United 
States breed mixes instead of with neighboring village dogs. Thus, 
African village dogs are a mosaic of indigenous dogs descended from 
early migrants to Africa, and non-native, breed-admixed individuals. 
Among putatively African breeds, Pharaoh hounds and Rhodesian
ridgebacks clustered with non-native rather than indigenous African 
dogs, suggesting they have predominantly non-African origins. Sur- 
prisingly, we find similar mtDNA haplotype diversity in African and 
East Asian village dogs, potentially calling into question the hypoth- 
esis of an East Asian origin for dog domestication. 

In many respects, dogs have a unique relationship to humans. They
were the first domesticated species, serve as valuable companions 
and service animals, and have been bred to exhibit more phenotypic 
diversity than any other mammal (1-3). Dogs were probably 
domesticated from Eurasian wolves at least 15,000-40,000 years
ago (4-6), although the process by which domestication took place, 
including the specific selected traits and the manner in which 
selection was performed, is very poorly understood (7, 8). 

After domestication somewhere in Eurasia, dogs quickly spread 
throughout the continent and into Africa, Oceania and the Amer- 
icas (9). These early dogs, like modern day “village dogs” (7), almost 
certainly lived as human commensals that were not subject to the 
same degree of intense artificial selection and closed breeding 
practices that characterize modern dog breeds. Like ancient human 
populations, these ancient dog populations developed genetic sig- 
natures characteristic of their geographic locale. These signatures 
would persist in both modern day village dog populations that 
descend from these ancient populations and in dog breeds that were 
founded from them. We refer to such dogs as “indigenous” in the 
sense that they carry characteristic genetic signatures appropriate 
for their geographic region. 

Today, semiferal village dogs are nearly ubiquitous around human 
settlements in much of the world, and such animals comprise a large 
proportion of the global dog population (7). However, the popularity of 
modern breeds has led to the widespread transport of mostly European- 
derived breed dogs into many areas containing village dogs, so it is likely 
that many modern village dogs are not derived solely from indigenous 
ancestors. We refer to village dogs that descend from these foreign dogs 
as “non-native” and expect that genetic markers can differentiate these 
village dogs from indigenous dogs. We believe most of these dogs will 
be complex mixtures of several non-native breeds and/or mixtures of 
both non-native breeds and indigenous village dogs (“intermediate” 
ancestry). 

The distinction between indigenous and non-native dogs is 
important because indigenous, but not non-native, village dogs are 
likely to contain genetic variants that are not found in any of today’s 
>400 recognized dog breeds. Furthermore, they are expected to be 
more informative regarding dog population history and are likely to 
be more adapted to local environmental conditions and more 
genetically related to the first prebreed domestic dogs than breed 
or breed-admixed individuals. To our knowledge, the degree to 
which village dogs consist of indigenous versus non-native individ- 
uals has not been quantified. 

In one of the most comprehensive surveys of village and breed 
dogs to date, Savolainen et al. (6) examined mtDNA diversity in a 
global panel of 654 dogs. Their results confirmed previous mtDNA 
evidence of dog domestication from Eurasian wolves (5) and showed
that East Asian dogs had the highest mtDNA diversity of any 
region, suggesting an East Asian origin of domestication. However, 
subsequent work by Pires et al. (10) has shown that mtDNA does 
not show significant population structure in village dogs. Because 
Savolainen et al. included many East Asian village dogs but few 
village dogs from other regions, their conclusion of high levels of 
East Asian diversity is likely a consequence of high levels of
mitochondrial diversity in village dogs and not necessarily an
indication of East Asian domestication.

Fig. 1. Map of village dog sampling locations. Colors denote each distinct
region and dots show approximate range of sampling within each region. See
Table S1 for full description.

Other genetic markers have been shown to exhibit significant 
population structure in village dogs. Microsatellites and MHC types 
both separate Bali street dogs from New Guinea singing dogs, 
dingoes, and breed dogs (11, 12). Both studies demonstrated high 
diversity in the Bali dogs, consistent either with an indigenous, 
prebreed ancestry or with a complex admixture history from a large 
number of breeds. Therefore, given a large enough sample of village 
and breed dogs, microsatellite and single nucleotide polymorphism 
(SNP) markers seem well suited to studying population structure 
and the possibility of breed admixture in village dogs.

In this study, we analyzed mtDNA, microsatellite, and SNP 
markers in 318 African village dogs to characterize population 
structure and genetic diversity. In addition, we analyzed 16 Puerto 
Rican street dogs, 102 known mixed-breed dogs from the United 
States, and several hundred dogs from 126 breeds, including 129 
dogs from five African and Middle Eastern breeds, to determine the 
degree of non-native admixture in African village dogs. Our sampling
effort concentrated on seven regions from three geographically
separated African countries (Fig. 1):

Egypt: We sampled three distinct locales: a Giza animal shelter, a Luxor
animal shelter and surrounds, and a rural desert oasis (Kharga). Although the
geographic distance between Giza and Luxor is greater than that
between Kharga and Luxor, we hypothesized that the desert would
be a strong barrier to gene flow, making the latter populations more
genetically distinct.

Uganda: We sampled >100 dogs from a cluster of villages east 
of Kampala and 30 dogs from three neighboring isles of the Kome 
Island group in Lake Victoria. Despite the islands being close to 
each other and the mainland (<20 km), we expected the lake might 
act as a dispersal barrier. 

Namibia: We sampled from over a dozen villages and urban areas 
in the northern and central parts of the country. No natural 
dispersal barriers existed between sampling locations, although a 
cordon fence is maintained to keep livestock diseases out of the 
southern part of the country. Dogs are permitted to be taken across 
the cordon and likely have little difficulty getting through the fence 
themselves, but the cordon is significant in that it demarcates the 
extent of European colonization influence in the country [with 
southern and central Namibia colonization history being roughly 
similar to that of South Africa while northern Namibia resembles 
the rest of sub-Saharan Africa (13)]. We sampled dogs within 100 
km of both sides of the cordon, including populations within 10-20 
km of the barrier. 

For comparison, we also sampled from two shelters in Puerto 
Rico, known mixed-breed dogs (see Methods) from the United 
States, and dogs from 126 breeds, including five African and 
near-African breeds (putative origin in parentheses): Afghan
hounds (Sinai, Egypt), Basenjis (Congo), Pharaoh hounds (near
Mediterranean), Rhodesian ridgebacks (Zimbabwe), and Salukis
(Iraq).

Fig. 2. STRUCTURE analysis across 389 SNP and microsatellite loci in African
village and American mixed breed dogs.


Results 


Inference of Population Structure and Degree of Breed Admixture in 
African Village Dogs. A subset of 223 unrelated African village dogs 
from seven African locales were typed on a panel of 89 microsat- 
ellite markers or 300 SNP markers (206 village dogs, 15 Puerto 
Rican dogs, and two United States mixed-breed dogs were typed on 
both panels). Using the Bayesian clustering program STRUC- 
TURE (14), we found that Puerto Rican street dogs clustered with 
the mixed-breed dogs from the United States, indicating these dogs 
are all breed admixtures. STRUCTURE analysis at K = 5 consis- 
tently showed the same five groupings: Egyptian dogs, Ugandan 
mainland dogs, Kome Island dogs, Northern Namibian dogs, and 
admixed dogs (including all Puerto Rican and U.S. dogs, nearly all 
Central Namibian dogs, and a few other African village dogs; Fig. 
2 and Fig. S1). At K = 4, STRUCTURE clustered Ugandan dogs
together (mainland and Kome Islands), and at K >5, STRUC- 
TURE subdivided Ugandan dogs further, although these clusters 
were inconsistent (Fig. S2). 

We quantified admixture in each village dog as the mean 
proportion of the genome assigned to the American (United 
States + Puerto Rico) cluster by STRUCTURE across 10 runs 
at K = 5 (admixture estimates using K = 4 or K = 6 mean 
proportions were nearly identical; R² = 0.984 and 0.992,
respectively). In total, 84% of African village dogs outside of 
central Namibia showed little or no evidence of non-native 
admixture (estimated admixture proportion <25% in 152 of 
181 dogs), whereas all central Namibian dogs had >25% 
admixture, and most had >60% (24 of 25; Table 1). Principal 
component analysis showed a clear separation of Egyptian 
from sub-Saharan populations in PC1 and separation between
Ugandan and Namibian populations in PC2 for indigenous
African village dogs for both SNP and microsatellite markers
(Fig. 3). When admixed African and American dogs were
included, PCA, like STRUCTURE, always clustered them
together, and the interpretation of the principal components
became more complicated (Fig. S2).

Table 1. Number of indigenous (<25% inferred admixture),
uncertain (25%-60% inferred admixture) and breed admixed
(>60% inferred admixture) village dogs by region from the 223
unrelated genotyped dogs

country      region    indigenous  uncertain  admixed
Egypt        Giza      7           4          0
Egypt        Luxor     25          0          0
Egypt        Kharga    5           0          0
Uganda       mainland  34          4          7
Uganda       isles     19          3          0
Namibia      central   0           1          24
Namibia      north     62          7          4
Puerto Rico            0           0          15

Fig. 3. Principal component analysis of indigenous African village dogs. (A) PCA
with the 89 microsatellite loci (n = 152). (B) PCA with the 300 SNP loci (n = 126).
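
The admixture classes used in Table 1 can be written down as a short sketch. The function and variable names below are illustrative and not part of the study's own code, and the handling of values that fall exactly on a threshold is an assumption; the counts are copied from Table 1.

```python
# Thresholds from Table 1: <25% indigenous, 25%-60% uncertain, >60% admixed.
# Treating exactly 25% and exactly 60% as "uncertain" is an assumption.
def classify_admixture(q_american: float) -> str:
    """Classify a dog by its mean STRUCTURE proportion in the American cluster."""
    if not 0.0 <= q_american <= 1.0:
        raise ValueError("admixture proportion must lie in [0, 1]")
    if q_american < 0.25:
        return "indigenous"
    if q_american <= 0.60:
        return "uncertain"
    return "admixed"


# African rows of Table 1 as (indigenous, uncertain, admixed) counts.
TABLE1 = {
    ("Egypt", "Giza"): (7, 4, 0),
    ("Egypt", "Luxor"): (25, 0, 0),
    ("Egypt", "Kharga"): (5, 0, 0),
    ("Uganda", "mainland"): (34, 4, 7),
    ("Uganda", "isles"): (19, 3, 0),
    ("Namibia", "central"): (0, 1, 24),
    ("Namibia", "north"): (62, 7, 4),
}


def indigenous_share(table, skip=()):
    """Return (indigenous dogs, all dogs) over the regions not listed in skip."""
    rows = [counts for region, counts in table.items() if region not in skip]
    indigenous = sum(row[0] for row in rows)
    total = sum(sum(row) for row in rows)
    return indigenous, total


if __name__ == "__main__":
    n_ind, n_all = indigenous_share(TABLE1, skip={("Namibia", "central")})
    print(f"{n_ind} of {n_all} dogs ({n_ind / n_all:.0%}) classed as indigenous")
```

Excluding central Namibia, the tally reproduces the 152 of 181 dogs (84%) reported in the text.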

To clarify the relationship between the Puerto Rican and African 
dogs that clustered with the two known mixed-breed dogs geno- 
typed on the full 389 marker panel, we ran STRUCTURE on the 
300-SNP dataset with an additional 100 known breed-admixed dogs 
from the United States that were genotyped on this SNP panel (Fig. 
S3). The groupings of African dogs and the inference of non-native
admixed individuals are highly consistent with the earlier analyses 
until K = 5, when STRUCTURE starts to detect groupings within 
the admixed individuals. The substructure found within admixed 
individuals may be a consequence of different ancestral breeds in 
different individuals; STRUCTURE analysis of the village dogs 
and dogs from 126 breeds shows that the putatively indigenous 
village dogs cluster with ancient breeds (specifically Basenjis) while 
the putatively non-native dogs cluster with modern breed groups in 
various proportions (Fig. S4). 

FST calculations confirm that central Namibian dogs show virtually
no genetic differentiation from American dogs (pairwise FST
based on SNP markers = 0.011; microsatellite FST = 0.0025). The
pairwise FST between Egyptian dogs from Giza and Luxor was also
low (SNP FST = 0.0024; microsatellite FST = 0.0057), whereas other
village dog populations had pairwise FST values of 0.025-0.133
(Table 2). Dogs from Kharga were the most distinct (FST of
0.0735-0.133) whereas dogs from mainland Uganda and northern
Namibia (~2,900 km apart) show only moderate differentiation
(FST = 0.0237-0.0254). Heterozygosity was high across all genetic
marker types in all village dog populations except those of the 
Kharga oasis and the Kome islands and low in all of the breed dogs 
(Table 3). 
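
To make the pairwise FST values concrete, the following is a minimal sketch of a heterozygosity-based FST for two populations at biallelic SNPs, computed as a ratio of sums across loci. It is a simplified stand-in and not the estimator the authors used (Eq. 6 of ref. 26): it applies no sample-size correction and weights the two populations equally. The function name is hypothetical.

```python
def fst_two_populations(freqs_a, freqs_b):
    """Wright-style FST = (H_T - H_S) / H_T from per-locus allele frequencies.

    freqs_a, freqs_b: frequencies of the same reference allele at each SNP
    in populations A and B. No correction for sample size is applied.
    """
    if len(freqs_a) != len(freqs_b) or len(freqs_a) == 0:
        raise ValueError("need the same non-zero number of loci in both populations")
    h_within = 0.0  # summed mean within-population expected heterozygosity
    h_total = 0.0   # summed expected heterozygosity of the pooled population
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2.0
        h_within += (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2.0
        h_total += 2 * p_bar * (1 - p_bar)
    if h_total == 0.0:
        return 0.0  # no variation at any locus, so no measurable differentiation
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical allele frequencies at three SNPs in two populations.
    print(fst_two_populations([0.40, 0.55, 0.10], [0.60, 0.50, 0.20]))
```

Summing numerators and denominators across loci before dividing keeps low-diversity SNPs from dominating the estimate.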


Origin of Putatively African Breeds. We included individuals from 
five breeds with presumed African or Middle Eastern ancestry in 
our principal component analyses to see whether this approach 
could detect which sampled village dog populations are closest to 
the founding population for each breed. For the SNP loci, PC1 and 
PC2 differentiated three breed groups—Basenjis, Salukis/Afghan 
hounds, and Rhodesian ridgebacks/Pharaoh hounds—while village 
dogs were clustered closer to the origin (Fig. 4). Notably, the village 
dog cluster still exhibited geographical structuring with Egyptian 
village dogs lying closest to the Saluki/Afghan hound cluster, 
indigenous Namibian and Ugandan dogs lying closest to the Basenji 
cluster, and breed-admixed Namibian and American dogs lying 
closest to the Rhodesian ridgeback/Pharaoh hound cluster. PCA of 
the microsatellite loci revealed the same clustering affinities (Egyp- 
tian village dogs nearest to Salukis/Afghan hounds, etc.) as the SNP 
PCA although the breed clusters were less well defined (Fig. S5). 


Analysis of Mitochondrial Diversity. We sequenced 680 bp of the 
mitochondrial D-loop, including the 582-bp region described in ref. 
6. We found 47 haplotypes in the African dogs as well as 9 hap- 
lotypes in the Puerto Rican dogs, two of which were also found in 
the sampled United States mixed breed dogs (see Table S1 and 
Table S3). All haplotypes were in the A (33 African haplotypes), B 
(6 African haplotypes), or C (8 African haplotypes) clades (Fig. S6), 
the clades that are believed to contain >95% of domestic dogs (6). 
Over the region sequenced in (6) and ignoring indels, we found 18 
African haplotypes that were not described by (6); 14 in A clade 
[one of which was found in Africa by (10)], one in B clade, and three 
in C clade. The Puerto Rican and United States mixed-breed dogs 
had 8 A clade and one B clade haplotypes (only one haplotype, a 
Puerto Rican A clade haplotype, was not previously described in ref.
6). 

Surprisingly, local mtDNA diversity did not differ systematically 
between African regions and similarly sized regions in East Asia, 
the purported origin of domestic dogs. Across the 582-bp region 
analyzed in refs. 6 and 10, and this study, the number of haplotypes 
observed in a region closely matches the neutral expectation (Fig. 
5). Differences in regional haplotype diversity appear to be driven 
by sampling artifacts rather than by distance from a hypothetical
domestication origin, with the highly sampled and fractionated 
subpopulations of Japan exhibiting the most diversity, and nearby 
Sichuan (China) probably exhibiting the least (Fig. 5). Neither 
Africa nor East Asia appears to contain private haplogroups
(haplotypes that are highly differentiated from those found on other
continents; Fig. S6).

Table 2. Pairwise FST in village dogs between regions based on 300 SNPs

          Giza    Kharga  Luxor   NA_cent  NA_north  UG_isles  UG_main  America
Giza      -
Kharga    8.98%   -
Luxor     0.62%   8.31%   -
NA_cent   3.48%   12.52%  5.92%   -
NA_north  3.87%   10.91%  4.75%   4.75%    -
UG_isles  4.90%   13.27%  6.28%   4.70%    4.93%     -
UG_main   3.38%   11.78%  4.95%   4.44%    2.54%     3.75%     -
America   2.70%   12.86%  5.15%   1.14%    4.97%     5.00%     4.51%    -

Boyko et al. PNAS | August 18, 2009 | vol. 106 | no. 33 | 13905

Table 3. Gene diversity (expected heterozygosity) at 89 microsatellite markers, 300 SNP markers, and the mitochondrial D-loop in
African village dogs and five breeds

                     microsatellites          SNPs                     mtDNA
                     all dogs    indigenous   all dogs    indigenous   all dogs
Egypt (Giza)         0.677 (11)  0.684 (8)    0.438 (11)  0.438 (8)    0.890 (11)
Egypt (Luxor)        0.666 (25)  0.664 (24)   0.419 (25)  0.417 (24)   0.936 (26)
Egypt (Kharga)       0.553 (5)   0.553 (5)    0.360 (5)   0.360 (5)    0.427 (5)
Uganda (mainland)    0.669 (43)  0.660 (30)   0.432 (19)  0.424 (16)   0.901 (118)
Uganda (isles)       0.633 (20)  0.630 (16)   0.435 (20)  0.429 (16)   0.858 (30)
Namibia (north)      0.638 (71)  0.631 (60)   0.429 (61)  0.415 (50)   0.929 (91)
Namibia (central)    0.637 (25)  0            0.466 (18)  0            0.916 (28)
America              0.648 (17)  0            0.459 (18)  0            0.909 (17)
Afghan Hound         0.333 (5)                0.317 (18)
Basenji              0.356 (5)                0.184 (19)
Pharaoh Hound        0.217 (4)                0.283 (16)
Rhodesian Ridgeback  0.353 (5)                0.368 (28)
Saluki               0.330 (5)                0.355 (24)

Sample sizes are given in parentheses.
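
The neutral expectation plotted in Fig. 5 comes from Ewens's sampling formula, under which the expected number of distinct haplotypes in a sample of n dogs is the sum of θ/(θ + j) for j = 0, ..., n-1. A minimal sketch follows, using the θ estimate reported in the Fig. 5 caption; the function name and the sample sizes in the loop are illustrative only.

```python
def expected_haplotypes(n: int, theta: float) -> float:
    """Expected number of distinct haplotypes among n sampled sequences
    under the infinite alleles model (Ewens's sampling formula)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    if theta <= 0:
        raise ValueError("theta must be positive")
    return sum(theta / (theta + j) for j in range(n))


if __name__ == "__main__":
    THETA = 8.654  # point estimate reported in the Fig. 5 caption
    for n in (5, 30, 100):  # illustrative sample sizes, not the study's regions
        print(n, round(expected_haplotypes(n, THETA), 1))
```

Because each added dog contributes a term θ/(θ + j) that shrinks as j grows, the expected haplotype count rises roughly with the logarithm of sample size, which is why heavily sampled regions show more haplotypes without being more diverse.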


Discussion 


This study analyzed a large number of genetic markers to charac- 
terize the level of non-native admixture in a geographically wide- 
spread set of semiferal village dog populations. African village dogs 
exhibit complex population structure because of the effects of 
geography, gene flow barriers, and the presence of non-indigenous 
dogs in some populations. Notably, the vast majority of the African 
village dogs could be classified as indigenous (<25% non-African 
ancestry) or non-native (>60% non-African ancestry), with only 
7% showing intermediate levels of African ancestry (Table 1). 
Classification of individuals as indigenous versus non-native was 
consistent between runs, and remained consistent even when the 
number of mixed-breed dogs included in the analysis was substan- 
tially increased (Fig. S3). 

With two exceptions, African village dogs did not exhibit a 
region-specific level of non-African admixture, but rather con- 
tained dogs with completely indigenous ancestry (or nearly so) that 
were often intermingling with a few highly admixed individuals. The 
lack of consistent levels of admixture within regions suggests that 
non-indigenous dog genes are quickly removed from village dog 
populations, or that admixture with non-indigenous dogs is a very 


recent phenomenon in these areas. The two exceptions were central 
Namibia, where every dog had significant levels of non-indigenous 
admixture (see below), and Giza, where all dogs showed some, 
usually low, level of admixture. This background level of admixture 
in Giza could reflect older mixing with breed dogs around this 
ancient city, or it could simply reflect the relative proximity of Giza 
to Eurasia, the ancestral home of most modern breed dogs. 
STRUCTURE analyses including dogs from 126 breeds suggest it 
is the latter—Egyptian dogs cluster partially with ancient (mostly 
Asian) breeds and the sub-Saharan (Basenji + village dog) cluster 
and do not appear to cluster significantly with any of the (mostly 
European) modern breed groups (Fig. S4). 

Dispersal barriers significantly affected population structure. 
The 230 km of desert separating the Kharga oasis from Luxor led 
to much stronger population differentiation (FST = 0.084) than the
500 km Nile corridor between Luxor and Giza (FST = 0.0024).
Likewise, the Kome islands, which lie 10-20 km from the mainland
in Lake Victoria, were much more differentiated from mainland
Uganda than were northern Namibian populations 2,900 km away
(FST = 0.051 vs. FST = 0.033). Most surprising, the 20-100 km
distance between northern and central Namibian populations that 
coincided with that country’s Red Line veterinary cordon fence 
represented a stark population boundary: dogs north of the cordon
averaged 87% indigenous African ancestry while those south
of the cordon were only 9% African. The cordon has separated the
indigenous human populations (to the north) from white settlement
areas (to the south) for the last 100 years and is currently used to
restrict livestock (but not humans or dogs) from crossing southward
(13). During this time, indigenous dogs have apparently been
extirpated from central Namibia, and the selective pressures on
dogs in each region must be strong and disparate enough to
maintain a sharp genetic boundary along this porous chain-link
fence. That Puerto Rico also seems to contain few, if any, indigenous
dogs highlights the degree to which colonization history
affects dog populations.

Fig. 4. [...] African and Middle Eastern breeds across 300 SNP markers in 186 village dogs and
105 breed dogs.

Fig. 5. Number of haplotypes (excluding indels) versus number of dogs sampled
within African and East Asian geographic regions. Note log scale of x axis. East
Asian samples from (6); African samples from this study or (10). See Table S4 for
a list of the areas used to construct this figure. The blue line depicts the expected
number of haplotypes from Ewens's sampling formula (29), which assumes an
infinite alleles model: $E(K) = \sum_{j=0}^{n-1} \theta / (\theta + j)$. Using Levenberg-Marquardt nonlinear
regression, we estimate $\theta$ to be 8.654 (95% C.I. = [7.41, 9.89]).

STRUCTURE and principal component analysis revealed strikingly
similar patterns of genetic variation: indigenous African
dogs clearly clustered by country and away from non-indigenous 
dogs in each analysis (Figs. 2-4). PCA showed slight differences 
between the SNP and microsatellite results: SNP but not micro- 
satellite markers led to PC1 separating out dogs based on admixture 
(Fig. S2), although PCA with only indigenous African dogs resulted 
in the same axes of variation in both sets (Fig. 3). Breeds were 
clustered more cleanly with the SNP dataset than the microsatellite 
dataset, although this result could be an effect of the larger number 
of breed dogs that were typed on the SNP panel rather than a 
consequence of using SNPs versus microsatellites per se (Fig. 4 and 
Fig. S5). Nevertheless, both marker sets clustered Salukis and 
Afghan hounds nearest to Egyptian village dogs and Basenjis 
nearest to indigenous Ugandan and Namibian dogs, as expected by 
each breed’s history. In contrast, Rhodesian ridgebacks and Pha- 
raoh hounds clustered nearest to admixed dogs, suggesting these 
breeds have been recreated from admixture with non-African dogs. 
These results are consistent with the STRUCTURE results from 
(15, 16), showing that Salukis, Afghan hounds, and Basenjis cluster 
with ancient, non-European breeds, while Pharaoh hounds and 
Rhodesian ridgebacks do not. Although this coarse sampling (3 
countries) is suitable for detecting truly indigenous versus recon- 
stituted ancestry in putatively African breeds, analysis including 
village dogs from more regions will be necessary to better localize 
the ancestral origins of these breeds. 

Village dog populations had higher levels of diversity than 
purebred dogs across all markers (see (17) for purebred mtDNA 
diversity estimates), although for SNP markers, non-native/admixed 
dogs had even higher diversity estimates. The high heterozygosity 
found in breed-admixed dogs is likely because of SNP ascertain- 
ment; by preferentially genotyping SNPs that are highly polymor- 
phic in breed dogs, inferences based on SNP diversity in village dogs 
may be biased. Microsatellite ascertainment bias is less likely to 
have this effect since even microsatellites that are highly polymor- 
phic in breeds can exhibit new alleles when genotyped in other 
populations. This suggests that careful control of ascertainment, or 
a denser SNP marker set that enables haplotype-based inference, is 
desirable for SNP markers. However, the high degree of concor- 
dance of SNP and microsatellite markers in both PCA and STRUC- 
TURE analyses shows that these methods are robust to these 
effects. 

African village dogs exhibited a similar level of mitochondrial 
D-loop diversity to that of the dogs sampled by (6) in East Asia, the 
putative site of dog domestication. Although we do not suggest that 
Africa is actually the site of dog domestication, we do believe that 
an East Asian origin of dogs should be further scrutinized, espe- 
cially as Africa also has numerous private haplotypes and East Asia 
has no private haplogroups, with the possible exception of clade E, 
which is poorly represented numerically (1 haplotype, 3 individuals) 
and is rather similar to clade C. The data appear consistent with a 
rapid spread of dogs after original domestication and high effective 
population sizes and gene flow between continents, as there is no 
clear signal of decreasing haplotype diversity away from any origin. 
Interestingly, Ugandan and northern Namibian populations that 
appear relatively undifferentiated using nuclear markers also have 
large overlap in their mitochondrial sequences. Thus, long-distance 
gene flow may be occurring, leading to a lower total number of 
haplotypes in these areas, whereas areas in Egypt with less chance 
for gene flow between them may harbor more diversity in the 
aggregate. This underscores the need to design a sampling and 
interpretation scheme to compare populations as opposed to coarse 
geographic areas. These areas could have features such as islands 
and deserts that may increase the number of haplotypes found only 
because one is sampling multiple populations. 

Besides finding 18 haplotypes not described in ref. 6, we have also expanded
the geographic range of some previously reported dog mtDNA 
haplotypes. For example, we found haplotype A29, the predomi- 
nant mtDNA haplotype of Australian dingoes, in a Puerto Rican 
dog even though this haplotype has never been reported in a dog 
outside of East Asia or the American Arctic (18). Either Puerto 
Rican dogs descend from some non-European (probably Asian) 
dogs that still carry this haplotype, or this is an indigenous New 
World haplotype that has persisted in Puerto Rico despite wide- 
spread historical European admixture. 

Our results clearly demonstrate the need for further research 
with indigenous village dogs. Indigenous dog populations can be 
largely eliminated, as in Puerto Rico and central Namibia, by 
European colonization, and it is unclear to what degree other
populations will be able to maintain their genetic identity and
persist in the face of modernity. The dog, although certainly a
species uniquely suited as a model organism for genomics, can also 
serve as an invaluable organism for comparative studies of evolu- 
tion and adaptation. Like other domesticated animals (e.g., cats, 
horses, and pigeons), dogs consist of breeds intensely selected for 
specific traits and feral populations that have been left to adapt to 
local conditions with “random” breeding. Dense genotyping and 
resequencing in these species should reveal genes underlying do- 
mestication in random-bred populations, instead of just those that 
have been under strong artificial selection in breed animals, and 
whether the relaxation of selective constraint observed in these 
species (19) is a product of recent breeding practices or domesti- 
cation per se. Resequencing in indigenous village dogs will also be 
necessary to obtain markers free of ascertainment bias to estimate 
the amount of genetic variation in dogs that is absent in existing 
modern breeds, and the degree to which present-day indigenous 
village dogs represent populations that have been randomly breed- 
ing since dog domestication versus remnants of ancient, indigenous 
breeds. 

Mitochondrial sequencing alone does not seem well-suited to 
determining the timing and location of domestication. Dog mito- 
chondrial haplogroups seem more or less cosmopolitan, and infer- 
ences based on mtDNA diversity statistics can be easily skewed by 
sampling effort and misled by the inability to distinguish indigenous 
from non-native dogs. In the absence of finding multiple highly 
diverged and highly localized mitochondrial haplogroups, genome- 
wide autosomal markers will be needed to unravel the story of the 
first domesticated species. 


Materials and Methods 


Sampling Protocol. Dogs were sampled from animal shelters or were brought to 
the researchers for sampling by owners and villagers. In accordance with Cornell 
IACUC protocol 2007-0076, 3-5 mL of blood was drawn from the cephalic or lateral
saphenous vein into K2-EDTA blood collection tubes. At the field site, blood cells
were lysed with an ammonium chloride solution and spun at 1,100 × g with a
portable centrifuge. After discarding the supernatant, cell pellets were resuspended
in an EDTA-Tris-SDS solution for transport to the DNA Bank at Cornell
Baker Institute for Animal Health. DNA was isolated from the lysate using
ammonium acetate and alcohol and was suspended in Tris-EDTA buffer. Concentrations
were determined by A260 on a NanoDrop ND1000 spectrophotometer.
Stock DNA was stored in -20 °C freezers by the Cornell Medical Genetics
Archive. Dilutions were made from a 200 µg/mL working stock as needed for
sequencing and genotyping. A similar protocol was followed for the 102 United 
States dogs, except that we also verified that they were mixtures of several 
different breeds by using the Wisdom MX breed test (Mars Inc.). 


Microsatellite Genotyping. Two hundred twenty-seven village dogs were typed 
on a 96-microsatellite panel described in (15, 16). Microsatellites were amplified 
individually in the presence of a fluorescently labeled universal primer and were 
combined post-PCR into sets of 1 to 4 markers for capillary electrophoresis on an 
ABI 3730xl (ABI). Standard PCR conditions have been described in ref. 15, while
adjustments made to individual markers are listed in Table S5. Each 96-well plate
of samples included a previously genotyped control sample for size verification,
and alleles were binned using GeneMapper 4.0. All genotype calls were checked manually
and markers were scanned individually for the appearance of new alleles outside 
the existing bins. After genotyping, 7 markers were excluded on the basis of high 
missing rates (>20%) or heterozygote deficits (P < 0.01) in a majority of the 8 
regional populations because this suggests the presence of null alleles at these 
loci. These data were combined with dogs from 126 breeds previously genotyped 
for breed structure studies (15, 16). 


SNP Genotyping. One hundred sixty-eight village dogs, 102 mixed-breed dogs, 
and dogs from 126 breeds were genotyped using the Sequenom iPLEX platform
on a 321-SNP panel described in ref. 20. For each sample, 2 µL of dog genomic
DNA was aliquoted into 13 separate microtiter wells for PCR amplification. Each
genomic aliquot was amplified in a total volume of 10 µL over 45 cycles with up to
28 primer pairs. Each reaction was treated with shrimp alkaline phosphatase for
40 min before heat inactivation. Primer extension reactions were carried out in a
standard thermocycler according to the Sequenom iPLEX Gold protocol. Each
reaction was desalted before spotting and shooting a SpectroChip on the Compact
MassARRAY system (Sequenom). Results were interpreted automatically
using cluster plots with the Histogram tabular view active in SpectroTyper-TyperAnalyzer
(Sequenom). SNP genotypes were loaded into PLINK version 1.0.4
(21) and 15 SNPs with high missingness (>20%) and 1 SNP with an extreme
heterozygote deficiency (P < 10^-7 below Hardy-Weinberg equilibrium) were
removed from further analysis. 
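
The missingness filter described above can be illustrated with a small sketch. The authors performed this step in PLINK; the plain-Python function below is only an illustration of the 20% rule, with a hypothetical name and data layout (one row per dog, one column per marker, None for a failed call).

```python
def high_missingness_markers(genotypes, max_missing=0.20):
    """Return indices of markers whose missing-call rate exceeds max_missing.

    genotypes: list of per-dog lists; each entry is a genotype call or None.
    """
    if not genotypes:
        raise ValueError("need at least one genotyped dog")
    n_dogs = len(genotypes)
    n_markers = len(genotypes[0])
    if any(len(dog) != n_markers for dog in genotypes):
        raise ValueError("every dog must have a call (or None) for every marker")
    flagged = []
    for marker in range(n_markers):
        n_missing = sum(1 for dog in genotypes if dog[marker] is None)
        if n_missing / n_dogs > max_missing:
            flagged.append(marker)
    return flagged


if __name__ == "__main__":
    # Hypothetical calls for five dogs at three SNPs.
    calls = [
        ["AA", "AG", "GG"],
        [None, "AG", "GG"],
        [None, None, "AG"],
        ["AG", "AA", "GG"],
        ["AA", "AA", "GG"],
    ]
    print(high_missingness_markers(calls))
```

In this toy input the first marker is missing in 2 of 5 dogs (40%) and is flagged, while the second is missing in exactly 20% and is kept because the rule is a strict inequality.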


Mitochondrial Sequencing. A 680-bp fragment of the mitochondrial D-loop was 
amplified in two overlapping reactions. Region-1 was amplified using forward 
primer H15422: 5'-CTCTTGCTCCACCATCAGC-3’, and reverse primer L15781: 5’- 
GTAAGAACCAGATGCCAGG-3’. Region-2 was amplified using forward primer 
H15693 5’-AATAAGGGCTTAATCACCATGC-3’ and reverse primer L16106: 5’- 
AAACTATATGTCCTGAAACC-3’ (primer names correspond to 3’ most position of 
primer, relative to the published dog mitochondrial genome as in (6)). PCR was 
carried out under the following protocol using 10 ng genomic DNA: Denaturation:
94 °C (40 s); annealing: 54 °C (1 min); amplification: 72 °C (1 min) for 35 total
cycles followed by a 5 min final extension step at 72 °C. Sequencing reactions
were carried out on an ABI 3730 sequencer with BigDye Terminator chemistry
using the Region-1 reverse primer and Region-2 forward primer. Any reads with
ambiguous bases were rerun in the opposite direction. Sequences were edited,
assembled, and aligned with Sequencher 4.8 (Gene Codes Corporation) and
submitted to GenBank with Sequin (http://www.ncbi.nlm.nih.gov/Sequin/).


Statistical Analyses. We used two approaches—principal component analysis 
with EIGENSOFT v2.0 (22) and clustering analysis with STRUCTURE v2.2 (14)—to 
classify individuals as indigenous or non-native and to describe the genetic 
structure of indigenous African village dogs and their relationship to dogs from 
putatively African breeds. We relied primarily on STRUCTURE to determine the 
proportion of non-African admixture present in each village dog because STRUCTURE
allows for probabilistic assignment of individuals to classes and explicit
modeling of admixture (22). In contrast, PCA makes no assumptions regarding 
discrete versus clinal population structure and is well suited for describing the 
principal axes of genetic variation between populations. In practice, STRUCTURE 
and PCA usually reveal very similar patterns of genetic variation (22). 

Before running these clustering methods, we removed markers in high LD
with other markers [r² > 0.5, see (23)] using Arlequin v3.11 (24) and removed 9
village dogs that showed high relatedness to another dog in the genotyping
panel (π̂ > 0.3). All STRUCTURE runs were done using the admixture model
with correlated allele frequencies, no prior population information, and default
parameter settings with a burn-in period of 100,000 iterations followed by
500,000 MCMC repetitions, with 10 runs per K, and averaged using CLUMPP
v1.1.2 (25). In contrast, PCA was carried out separately for the SNP and microsatellite
markers. Microsatellite loci with n > 2 alleles were recoded as n-1 biallelic
loci before running PCA in EIGENSOFT.
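
One way to picture the n-1 biallelic recoding of a microsatellite locus is sketched below. The exact scheme the authors used is not specified in the text, so this is an assumption: each retained allele becomes one pseudo-locus holding the count (0, 1, or 2) of that allele in each dog, and the last allele in sorted order is dropped because its count is implied by the others. All names are illustrative.

```python
def recode_microsatellite(genotypes):
    """Recode one multi-allelic locus into n-1 allele-count columns.

    genotypes: list of (allele, allele) tuples, or None for a missing call.
    Returns (kept_alleles, columns), where columns[i][d] is the number of
    copies of kept_alleles[i] carried by dog d (None if the call is missing).
    """
    observed = sorted({a for g in genotypes if g is not None for a in g})
    kept = observed[:-1]  # drop one allele; assumption: the largest is dropped
    columns = [
        [None if g is None else g.count(allele) for g in genotypes]
        for allele in kept
    ]
    return kept, columns


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) at one locus in four dogs.
    locus = [(150, 152), (152, 152), (154, 150), None]
    kept, cols = recode_microsatellite(locus)
    for allele, col in zip(kept, cols):
        print(allele, col)
```

A locus with three observed alleles yields two pseudo-loci, so highly polymorphic microsatellites contribute more columns to the PCA than a single SNP does.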

Expected heterozygosity (h) was calculated in Arlequin after removing 10 dogs
that appeared to be related at r ≈ 0.5. FST based on SNP loci was computed
with a custom C++ implementation of Eq. 6 from (26); microsatellite FST was computed
using Arlequin. Unless otherwise noted, statistical tests were performed in R
v2.6.2 (27). STRUCTURE results were plotted using Distruct v1.1 (28). 
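
As an illustration of the gene diversity statistic reported in Table 3, the sketch below computes the standard sample-size-corrected expected heterozygosity, h = n/(n-1) × (1 - Σ p_i²), from a list of observed alleles or haplotypes. That this matches the quantity Arlequin reported for the study is an assumption, and the function name and example labels are hypothetical.

```python
from collections import Counter


def gene_diversity(alleles):
    """Unbiased gene diversity from a list of sampled allele or haplotype labels.

    For diploid nuclear markers pass every gene copy (two per dog);
    for mtDNA pass one haplotype per dog.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled gene copies")
    counts = Counter(alleles)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Hypothetical mtDNA haplotype labels for six dogs.
    print(gene_diversity(["A11", "A11", "A17", "B1", "B1", "C3"]))
```

The n/(n-1) factor matters most for small samples such as the five Kharga dogs, where the uncorrected value would be biased downward.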